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The INLA approach for approximate Bayesian inference for latent Gaussian models has 
been shown to give fast and accurate estimates of posterior marginals and also to be a valuable 
tool in practice via the R-package R-INLA. In this paper we formalize new developments in 
the R-INLA package and show how these features greatly extend the scope of models that 
can be analyzed by this interface. We also discuss the current default method in R-INLA 
to approximate posterior marginals of the hyperparameters using only a modest number of 
evaluations of the joint posterior distribution of the hyperparameters, without any need for 
I numerical integration. 
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The Integrated Nested Laplace Approximation (INLA) is an approach proposed by Rue et al. 



^ ( 2009 ) to perform approximate fully Bayesian inference on the class of latent Gaussian models 

^ (LGMs). INLA makes use of deterministic nested Laplace approximations and, as an algorithm 

c3 tailored to the class of LGMs, it provides a faster and more accurate alternative to simulation- 

based MCMC schemes. This is demonstrated in a series of examples ranging from simple to 



complex models in Rue et al. ( 2009 ) . Although the theory behind INLA has been well established 
in 



Rue et al. (2009), the INLA method continues to be a research area in active development. 



Designing a tool that allows the user the flexibility to define their own model with a relatively 



'Corresponding author. 
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easy to use interface is an important factor for the success of any approximate inference method. 
The R package INLA, hereafter refereed as R-INLA, provides this interface and allow users to 
specify and perform inference on complex LGMs. 

The breadth of classical Bayesian problems covered under the LGM framework, and therefore 
handled by INLA, is — when coupled with the user-friendly R-INLA interface — a key element 
in the success of the INLA methodology. For example, INLA has been shown to work well with 



generalized linear mixed models (GLMM) (Fong et al. , 2010), spatial GLMM (Eidsvik et al. 



2009), Bayesian quantile additive mixed models (Yue and Rue, 2011 ), survival analysis (Martino 



et al. , 2011b), stochastic volatility models (Martino et al. 2011a), generalized dynamic linear 



models (Ruiz-Cardenas et al. , 2011), change point models where data dependency is allowed 



within segments (Wyse et al. 2011), spatio-temporal disease mapping models (Schrodle and 



Held 2011), models to complex spatial point pattern data that account for both local and 



global spatial behavior (Illian et al. , 2011), and so on. 

There has also been a considerable increase in the number of users that have found in INLA 
the possibility to fit models that they were otherwise unable to fit. More interestingly, those 
users come from areas that are sometimes completely unrelated to each other, such as econo- 
metric, ecology, climate research, etc. Some examples are bi-variate meta-analysis of diagnostic 



studies (Paul et al. , 2010), detection of under-reporting of cases in an evaluation of veterinary 



surveillance data (Schrodle et al. 2011), investigation of geographic determinants of reported 



human Campylobacter infections in Scotland (Bessell et al. , 2010), the analysis of the impact of 



different social factors on the risk of acquiring infectious diseases in an urban setting (Wilking 



et al. , 2012), analysis of animal space use metrics (Johnson et al. , 2011), animal models used in 



evolutionary biology and animal breeding to identify the genetic part of traits (Holand et al. 



2011), analysis of the relation between biodiversity loss and disease transmission across a broad. 



heterogeneous ecoregion (Haas et al. , 2011), identification of areas in Toronto where spatially 
varying social or environmental factors could be causing higher incidence of lupus than would 



be expected given the population (Li et al. , 2011), and spatio-temporal modeling of particulate 



matter concentration in the North-Italian region Piemonte (Cameletti et al. , 2012). The rela- 
tive black-box format of INLA allows it to be embedded in external tools for a more integrated 



data analysis, e.g. Beale et al. (2010) mention that INLA has been used by tools embedded in 
a Geographical Information System (CIS) to evaluate the spatial relationships between health 
and the environment data. The model selection measures available in INLA in also something 
very much appreciated in the applied work mentioned so far, such quantities include marginal 



likelihood, deviance information criterion (DIG) ( Spiegelhalter et al. , 2002), and other predictive 
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measures. 



Some extensions to the work of Rue et al. (2009) has also been present in the Hterature, 



Hosseini et al. (2011) extends the INLA approach to fit spatial GLMM with skew normal priors 



for the latent variables instead of the more standard normal priors, S0rbye and Rue (2010) extend 
the use of INLA to joint inference and present an algorithm to derive analytical simultaneous 
credible bands for subsets of the latent field based on approximating the joint distribution of 



the subsets by multivariate Gaussian mixtures, Martins and Rue (2012) extend INLA to fit 
models where independent components of the latent field can have non-Gaussian distributions. 



and Cseke and Heskes (2011) discuss variations of the classic Laplace-approximation idea based 



on alternative Gaussian approximations (see also (Rue et al. , 2009, pp. 386-7) for a discussion 
on this issue). 



A lot of advances have been made in the area of spatial and spatial-temporal models, Eidsvik 



et al. (2011) address the issue of approximate Bayesian inference for large spatial datasets by 



combining the use of prediction process models as a reduced-rank spatial process to diminish 
the dimensionality of the model and the use of INLA to fit this reduced-rank models. INLA 



blends well with the work of Lindgren et al. (2011) where an explicit link between Gaussian 
Fields (GFs) and Gaussian Markov Random Fields (GMRFs) allow the modeling of spatio and 
spatio-temporal data to be done with continuously indexed GFs while the computations are 
carried out with GMRFs, using INLA as the inferential algorithm. 

The INLA methodology requires some expertise in numerical methods and computer pro- 
gramming to be implemented, since all procedures required to perform INLA need to be carefully 
implemented to achieve a good speed. This can, at first, be considered a disadvantage when 
compared with other approximate methods such as (naive) MCMC schemes that are much easier 
to implement, at least on a case by case basis. To overcome this, the R-INLA package was devel- 
oped to provide an easy to use interface to the stand-alone C coded inla program. To download 
the package one only needs one line of R code that can be found on the download section of the 



INLA website (http://www.r-iiila.org/). In addition, the website contains several worked 
out examples, papers and even the complete source code of the project. 

Unfortunately, when an interface is designed, a compromise must be made between simplicity 
and generality, meaning that in order to build a simple to use interface, some models that could 
be handled by the INLA method might not be available through that interface, hence not 
available to the general user. The first part of this paper will formalize some new developments 
already implemented on the R-INLA package and show how these new features greatly extend 



^The dependency on the stand-alone C program is the reason why R-INLA is not available on CRAN. 
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the scope of models available through that interface. 



In Rue et al. (2009) most of the attention was focused on the computation of the poste- 
rior marginal of the elements of the latent field since those are usually the biggest challenge 
when dealing with LGMs given the high dimension of the latent field usually found in models 
of interest. On the other hand, it was mentioned that the posterior marginal of the unknown 
parameters not in the latent field, hereafter refereed as hyperparameters, are obtained via nu- 
merical integration of an interpolant constructed with evaluations of the Laplace approximation 
of the joint posterior of the hyperparameters already computed in the computation of the pos- 
terior marginals of the latent field. However, details of such interpolant were not given, and the 
second part of this paper will show how to construct this interpolant in a very cost-effective way. 
Besides that we will describe an algorithm, the one currently in use in the R-INLA package, that 
completely bypass the need for numerical integration, providing accuracy and scalability. 

Section [2] will present an overview of the latent Gaussian models and of the INLA methodol- 
ogy. A number of new features already implemented in the R-INLA package will be formalized in 
Section [3] together with examples highlighting their usefulness. Section |4] will address the issue 
of computing the posterior marginal of the hyperparameters using a novel approach. 



2 Integrated Nested Laplace Approximation 



In Section [2. 1| we define latent Gaussian models using a hierarchical structure highlighting the 
assumptions required to be used within the INLA framework and point out which components 
of the model formulation will be made more flexible with the features presented in Section |3j 



Section 2.2 gives a brief description of the INLA approach and presents the task of approximating 



the posterior marginals of the hyperparameters that will be formalized in Section [4| A basic 



description of the R-INLA package is given in Section 2.3 and this is mainly to situate the reader 
when going through the extensions in Section [3} 



2.1 Latent Gaussian models 

The INLA framework was designed to deal with latent Gaussian models, where the observation 
(or response) variable m is assumed to belong to a distribution family (not necessarily part of 
the exponential family) where the mean /ij is linked to a structured additive predictor T]i through 
a link function g{-), so that g{fj.i) = rji- The structured additive predictor r]i accounts for effects 
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of various covariates in an additive way: 

n/ rip 

r]i = a + '^f'^^^Uji) + '^l3kZki + ei, (1) 
j=i k=i 

where {/^^HOI's 

are unknown functions of the covariates u, used for example to relax linear 
relationship of covariates and to model temporal and/or spatial dependence, the {/3fc}'s represent 
the linear effect of covariates z and the {ejj's are unstructured terms. Then a Gaussian prior is 
assigned to a, {f^^\-)}, {/3fc} and {ej}. 

We can also write the model described above using a hierarchical structure, where the first 
stage is formed by the likelihood function with conditional independence properties given the 
latent field x = [r], a, f , fi) and possible hyperparameters 6i, where each data point {yi,i = 
1, 71(^1 is connected to one element in the latent field Xj. Assuming that the elements of the 
latent field connected to the data points are positioned on the first rid elements of a;, we have 

Stage 1. y\x, Oi ~ TT{y\x, Oi) = n"=i 7r(yi|xi, Oi). 

Two new features relaxing the assumptions of Stage 1 will be presented in Section [3} Section 
13.11 will show how to fit models where different subsets of data come from different sources 



(i.e. different likelihoods) and Section 3.4 will show how to relax the assumption that each 
observation can only depend on one element of the latent field and allow it to depend on a linear 
combination of the elements in the latent field. 

The conditional distribution of the latent field x given some possible hyperparameters 62 
forms the second stage of the model and has a joint Gaussian distribution. 

Stage 2. x\e2 ~ ^(a;|6>2) = N{x- /x(6'2), Q"^(6'2)), 

where A/'(-; /i, Q^^) denotes a multivariate Gaussian distribution with mean vector /x and a pre- 
cision matrix Q. In most applications, the latent Gaussian field have conditional independence 
properties, which translates into a sparse precision matrix Q(62), which is of extreme impor- 
tance for the numerical algorithms that will follow. A multivariate Gaussian distribution with 



sparse precision matrix is denoted by Gaussian Markov Random Field (GMRF) (Rue and Held 



2005 ) . The latent field x may have additional linear constraints of the form Ax = e for an kxn 



matrix A of rank k, where k is the number of constraints and n the size of the latent field. Stage 
2 is very general and can accommodate an enormous number of latent field structures. Sections 
3.2[ 3.3 and 3.6 will formalize new features of the R-INLA package that gives the user greater 



fiexibility to define these latent field structure, i.e. enable them to define complex latent fields 
from simpler GMRFs building blocks. 
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The hierarchical model is then completed with an appropriate prior distribution for the 
hyperparameters of the model 6 = {61,02) 



Stage 3. ~ Tr{e). 



2.2 INLA methodology 



For the hierarchical model described in Section 2.1, the joint posterior distribution of the un- 
knowns then reads 



Tr{x,e\y) oc TT{e)'K{x\e)Y{T^{yi\xi,6) 



i=l 

■n-d 



oc7r(6>)|Q(6')|"/2 exp 



-x''Qie)x + J2^og{7riyi\xi,e)} 



2 

i=l 



and the marginals of interest can be defined as 

7r{xi\y) = j ■K{xi\e, y)iT{e\y)d9 i = 1, n 

'^i^j\y) = j T^{0\y)de^j j = l,...,m 

while the approximated posterior marginals of interest 7r{xi\y), i = l,..,n and 7r{9j\y), j = 
1, m returned by INLA has the following form 

H^i\y) = Y.^{x,\e^^\y)^{e^^)\y) ao^'^^ (2) 

k 

n{ej\y) = J my)de-, (3) 

where {Tt{6^''^\y)} are the density values computed during a grid exploration on 7r{6\y). 

Looking at [Q-Q], we can see that the method can be divided into three main tasks, firstly 
propose an approximation Tr{9\y) to the joint posterior of the hyperparameters TT{0\y), secondly 
propose an approximation 7r{xi\9,y) to the marginals of the conditional distribution of the latent 
field given the data and the hyperparameters 7r(xj|0, y) and finally explore 7r(0|y) on a grid and 
use it to integrate out 9 in Eq. ^ and O^j in Eq. Q. 

Since we don't have Tr{0\y) evaluated at all points required to compute the integral in Eq. Q 
we construct an interpolation I{9\y) using the density values {7r{9^''^\y)} computed during the 
grid exploration on 7r{9\y) and approximate ^ by 



n{e,\y) = / I{9\y)d9.,. (4) 
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Details on how to construct such interpolant were not given in Rue et aL ( |2009[ ). Besides the 
description of the interpolation algorithm used to compute Eq. Q , Section [4] will present a 
novel approach to compute Tt{6j\y) that bypass numerical integration. 

The approximation used for the joint posterior of the hyperparameters 'ir{0\y) is 

TT{x,e,y 



7r{e\y) oc 



7rG{x\0,y) 



x=x*(e) 



(5) 



where 'kg{x\6, y) is a Gaussian approximation to the full conditional of x obtained by matching 
the modal configuration and the curvature at the mode, and x*{6) is the mode of the full 
conditional for x, for a given 6. Expression ^ is equivalent to the Laplace approximation of 
a marginal posterior distribution (Tierney and Kadane, 1986), and it is exact if TT{x\y,0) is a 
Gaussian. 

For 7r(xj|0, y), three options are available, and they vary in terms of speed and accuracy. The 
fastest option, 'KG{xi\9^y), is to use the marginals of the Gaussian approximation T:G{x\6^y) 
already computed when evaluating expression ([s]). The only extra cost to obtain 'iTG{xi\6,y) is 



to compute the marginal variances from the sparse precision matrix of 7rG'(a;|0, y), see Rue et al. 



(2009) for details. The Gaussian approximation often gives reasonable results, but there can be 
errors in the location and/or errors due to the lack of skewness (Rue and Martino, 2007| >. The 
more accurate approach would be to do again a Laplace approximation, denoted by 'KLA{xi\9, y), 
with a form similar to expression ([s]) 

TT{x,9,y) 



TTLA{xi\0,y) oc 



TTGGixi\xi,0,y) 



(6) 



where 7rGG{xi\xi, 0, y) is the Gaussian approximation to Xi\xi, 6, y and * {xi, 0) is the modal 
configuration. A third option TTsLA{xi\0, y), called simplified Laplace approximation, is obtained 
by doing a Taylor expansion on the numerator and denominator of expression ([g]) up to third 
order, thus correcting the Gaussian approximation for location and skewness with a much lower 



cost when compared to iTLAixilOjy). We refer to Rue et al. (2009) for a detailed description of 
the Gaussian, Laplace and simplified Laplace approximations to TT{xi\6,y). 



2.3 R-INLA interface 

In this Section we present the general structure of the R-INLA package since the reader will 
benefit from this when reading the extensions proposed in Section [Sj The syntax for the R-INLA 
package is based on the built-in glm function in R, and a basic call starts with 

formula = y ~ a + b + a:b + c*d + f(idxl, modell, ...) + f(idx2, model2, ...) 
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where formula describe the structured additive hnear predictor described in Eq. ([T]). Here, y 
is the response variable, the term a + b + a:b + c*d hold similar meaning as in the built-in 
glm function in R and are then responsible for the fixed efi^ects specification. The f () terms 
specify the general Gaussian random efi'ects components of the model and represent the smooth 
functions {/^^'H-)} in Eq. Q. In this case we say that both idxl and idx2 are latent building 
blocks that are combined together to form a joint latent Gaussian model of interest. Once the 
linear predictor is specified, a basic call to fit the model with R-INLA takes the following form: 

result = inla (formula, data = data.frame(y, a, b, c, d, idxl, idx2) , 
family = "gaussian") 

After the computations the variable result will hold an S3 object of class "inla", from which 
summaries, plots, and posterior marginals can be obtained. We refer to the package website 
http : / / www . r- inla . org for more information about model components available to use inside 
the f () functions as well as more advanced arguments to be used within the inlaO function. 



3 Extending the scope of INLA 

This section formalizes several features available within the R-INLA package that greatly ex- 
tend the scope of models available. The features are illustrated with small examples that help 
to understand the usefulness of the features and to apply it through R code available in the 
appendix. 



3.1 Multiple likelihoods 

In many applications, different subsets of data may have been generated by different sources, 
leading us to be interested in models where each subset of data may be described by a different 
likelihood function. Here different likelihood functions might mean either a different family 
of distribution, as for example when a subset of the data follow a Gaussian distribution and 
the other follow a Poisson distribution or, the same family of distribution but with different 
hyperparameters, as for example when one subset of the data comes from a Gaussian distribution 
with unknown precision ri and the other from a Gaussian with unknown precision T2. 

Although being a very useful feature, models with multiple likelihoods are not straightfor- 
ward, if at all possible, to implement through many of the popular packages available in R. From 
a theoretical point of view there is nothing that keep us from fitting a model with multiple 
likelihoods with the INLA approach. The only requirements, as described in Section 2.1, are 
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that the hkehhood function must have conditional independence properties given the latent field 
X and hyperparameters Oi, and that each data-point yi must be connected to one element in 
the latent field Xi, so that 



7r{y\x,ei) = Y^T^iyilxi^Oi). 



i=l 



Even this last restriction will be made more flexible in Section 3.5 where each data-point yi may 
be connected with a linear combination of the elements in the latent field. 

Models with multiple likelihoods can be fitted through the R-INLA package by rewriting the 
response variable as a matrix (or list) where the number of columns (or elements in the list) are 
equal to the number of different likelihood functions. The following small example will help to 
illustrate the process. 

Example 1. Suppose we have a dataset y with 2n elements where the first n data points come 
from a binomial experiment and the last n data points come from a Poisson distribution. In 
this case the response variable to be used as input to the inlaO function must be written as 
a matrix with two columns and 2n rows where the first n elements of the first column hold 
the binomial data while the last n elements of the second column hold the Poisson data, and 
all other elements of the matrix should be filled with NA. Following is R code to simulate data 
following the description above together with R-INLA code to fit the appropriate model to the 
simulated data. 

n = 100 

xl = runif (n) 

etal = 1 + xl 

yl = rbinom(n, size = 1, prob = exp(etal)/(l+exp(etal))) # binomial data 

x2 = runif (n) 

eta2 = 1 + x2 

y2 = rpois(n, exp(eta2)) 

Y = matrix (NA, 2*n, 2) # need the response variable as matrix 
Y[l:n, 1] = yl # binomial data 

Y[l:n + n, 2] = y2 # poisson data 

Ntrials = c(rep(l,n), rep(NA, n)) # required only for binomial data 
XX = c(xl, x2) 
formula = Y ~ 1 + xx 

result = inla(f ormula, data = list(Y = Y, xx = xx) , 

fcimily = cC'binomial" , "poisson"), Ntrials = Ntrials) 
summary (result) 
plot (result) 
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3.2 Replicate feature 

The replicate feature in R-INLA allows us to define models where the latent field x contain 
conditional independent replications of the same latent model given some hyperparameters. 
Assume for example that zi and Z2 are independent replications from z\0 such that x = (zi, Z2) 
and 

7r{x\e) = TT{zi\9)7r{z2\e) (7) 

It is important to note here that although the process zi and Z2 are conditionally independent 
given they both convey information about 0. A latent model such as ([7|) can be defined in 
the R-INLA package using the replicate argument inside the f () function used to specify the 



random effect components as described in Section 2.3 
Example 2. Let us define the following AR{1) process 

xi~iV(0, {k{1 - (1)^))-^) 
Xi = (j)Xi_i + €i; ej ~ iV(0,/t"^), i 



,n 



with (p and k being unknown hyperparameters satisfying |(/)| < 1 and k > 0. Denote by r the 
marginal precision of the process, r = k(1 — 0^). Now assume two conditionally independent 
realizations zi and Z2 of the ^-R(l) process defined above given the hyperparameters 9 = {(p, r). 
We are then given a dataset y with 2n elements where the first n elements come from a Poisson 
with intensity parameters given by exp(zi) and the last n elements of the dataset come from a 
Gaussian with mean Z2. The latent model x = (zi, Z2) described here can be specified with a two 
dimensional index (i, r) where i is the position index for each process and r is the index to label 
the process. Following is the INLA code to fit the model we just described to simulated data with 
(f) = 0.5, K = \/2 and Gaussian observational precision Tobs = 3. Priors for the hyperparameters 
were chosen following the guide-lines described in Fong et al. ( |2010 ). Figure[T]show the simulated 



z = (zi,Z2) (solid line) together with posterior means and (0.025, 0.975)-quantiles (dashed line) 
returned by INLA. Figure [2] show the posterior distributions for the hyperparameters returned 
by INLA with a vertical line to indicate true values used in the simulation. 

n = 100 

zl = arima. sim(n, model = list(ar = 0.5), sd = 0.5) # independent replication 
z2 = arima. sim(n, model = list(ar =0.5), sd = 0.5) # from AR(1) process 
yl = rpoisCn, exp(zl)) 

y2 = rnormCn, mean = z2, sd = l/sqrt(3)) 

y = matrix(NA, 2*n, 2) # Setting up matrix due to multiple likelihoods 
y[l:n, 1] = yl 
y[n + l:n, 2] = y2 
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hyper . gaussian = listCprec = list (prior = "loggamma", # prior for Gaussian 

param = c(l, 0.2161))) # likelihood precision 
hyper. arl = listCprec = listCprior = "loggamma", # priors for the 

param = c(l, 0.2161)), # 'arl' model 

rho = listCprior = "normal", 

param = cCO, 0.3))) 
i = repCl:n, 2) # position index for each process 

r = repCl:2, each = n) # index to label the process 

formula = y ~ fCi, model = "arl", replicate = r, hyper = hyper. arl) -1 
result = inlaCf ormula, family = cC"poisson", "gaussian"), 
data = listCy = y, i = i, r = r) , 

control . family = listClistC), listChyper = hyper.gaussisin))) 
summary Cresult) 
plot Cresult) 




Figure 1: Replicate example: (a) Simulated z\ process (solid line) together with posterior means 
and (0.025, 0.975) quantiles returned by INLA (dashed line) (b) Simulated Z2 process (solid line) 
together with posterior means and (0.025, 0.975) quantiles returned by INLA (dashed line) 

□ 

3.3 Copy feature 

The formula syntax as illustrated in Section [2.3| allow us to have only one element from each 
latent model to contribute to the linear prediction specification. So that a model formulation 
such as 
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(a) ■ ■ ' ' "(b) • ■ • ' ■■ " (c) 

Figure 2: Posterior distribution for the hyperparameters in the repUcate example with a vertical 
line to indicate the true values used in the simulation, (a) Gaussian observational precision (b) 
Precision of the AR(1) latent model (c) lag-one correlation for the AR(1) process 

formula = y ~ f(idxl, modell, ...) + f(idx2, model2, ...) 

indicate that each data point m is connected to one linear predictor rji through a given link 
function g and that each rji is connected to one element of the random effect idxl and to one 
element of the random effect idx2. Unfortunately this is not always enough as illustrated in the 
example below. 

Example 3. Suppose our data come from a Gaussian distribution yi ~ N{r]i,T^^), i = I, ...,n, 
where the linear prediction rji assume the following form 

r]i = ai + biZi, (ai,6i) ~ iV2(0,Q~"^), 

where z represent here known covariates. The bi-variate Gaussian model N2{0,Q~^) is defined 
in R-INLA by f (i , model = "iid2d"). However, a definition like 

formula = y ~ f(i, model = "iid2d", ...) - 1 

does not allow us to define the model of interest where each linear predictor rji is connected to 
two elements of the bi-variate Gaussian model, which are Oj and bi in this case. To address this 
inconvenience the copy feature was created and our model formulation could be defined by 

formula = y ~ f(i, model = "iid2d", n = 2*n) + 
f(i.plus, z, copy = "i") 

with appropriate definitions for the indexes i and i .plus. Below is R code to simulate data and 
to fit the model described above with INLA. The data is simulated with observational precision 
T = 1 and bi-variate Gaussian distribution for the random-effects {ai,bi), i = 1,...,1000 with 
marginal precisions Ta = Tb = 1 for Oj and bi respectively, and correlation pab between and bi 
equal to 0.8. Figure [3] show the posterior marginals for the hyperparameters returned by INLA. 
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n = 1000 

Sigma = matrix(c(l, 0.8, 0.8, 1), 2, 2) 
z = rnorm(n) 

ab = rmvnormCn, sigma = Sigma) # require 'mvtnorm' package 

a = ab[, 1] 

b = ab[, 2] 

eta = a + b * z 

y = eta + rnorm(n, sd = 1) 

hyper. gaussian = listCprec = list (prior = "loggamma", 

param = c(l, 0.2161))) 
i = l:n # use only the first n elements (a_l, a_n) 
j = l:n + n # use only the last n elements (b_l, . . . , b_n) 
formula = y ~ f(i, model = "iid2d", n = 2*n) + 

f(j, z, copy = "i") - 1 
result = inla(f ormula, data = list(y = y, z = z, i = i, j = j) , 

fcunily = "gaussicin", 

control. data = list (hyper = hyper. gaussian)) 
summary (result) 
plot (result) 



□ 

Formally, the copy feature is used when a latent field is needed more than once in the model 
formulation. When using the feature we then create a (almost) identical copy of xs, denoted 
here by x*g, that can then be used in the model formulation as shown in Example [Sj In this 
case, we have extended our latent field from xs to x = {xs,x*g), where 7r{x) = Tr{xs)TT{x*g\xs) 
and 

Tr{x*s\xs,T) (xexp!^-^{x*s - xs)'^ix} - xs)^ (8) 

so that the degree of closeness between xs and x*^ is controlled by the fixed high precision r in 
Eq. Q, which has a default value of r = exp 15. It is also possible for the copied model to have 
an unknown scale parameter if), in which case 

Tr{x*s\xs,T,'il;) oc exp|^(a3^ - i;xsf{x*s - 4^x3)^ (9) 
3.4 Linear combinations of the latent field 

Depending on the context, interest might lie not only on posterior marginals of the elements 
in the latent field but also on linear combinations of those elements. Assume v are the linear 
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(a) 



(b) 





Figure 3: Posterior distribution for the hyperparameters in the copy example with a vertical 
line to indicate the true values used in the simulation, (a) Gaussian observational precision r 
(b) Marginal precision for a^, (c) Marginal precision for 6j, r;, (d) Correlation between aj and 
bi, Pab- 



combinations of interest, it can then be written as 



V 



Bx, 



where x is the latent field and S is a A; x n matrix where k is the number of linear combinations 
and n is the size of the latent field. The functions inla.make . lincomb and inla.make . lincombs 
in R-INLA are used to define a linear combination and many linear combinations at once, re- 
spectively. 

R-INLA provides two approaches for dealing with v. The first approach creates an enlarged 
latent field x = {x,v) and then use the INLA method as usual to fit the enlarged model. After 
completion we then have posterior marginals for each element of x which includes the linear 
combinations v. Using this approach the marginals can be computed using the Gaussian, Laplace 



or simplified Laplace approximations discussed in Section 2.2 The drawback is that the addition 
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of many linear combinations will lead to more dense precision matrices which will consequently 
slow down the computations. This approach can be used by defining the linear combinations of 
interest using the functions mentioned on the previous paragraph and using control . inla = 
list (lincomb. derived. only = FALSE)) as an argument to the inla function. 

The second approach does not include v in the latent field but perform a post-processing of 
the resulting output given by INLA and approximate v\6,y by a Gaussian where 

E^\e,y{'") = Bt^* and YaT^ie,y{v) = BQ*''b'' , 

in which fi* is the mean of best marginal approximation used for TT(xi\0,y) (i.e. Gaussian, 
Simplified Laplace or Laplace approximation) and Q* is the precision matrix of the Gaussian 
approximation -iTG{x\0,y) used in Eq. ([s]). Then approximation for the posterior marginals of 
V are obtained by integrating out in a process similar to Eq. ([2]). The advantage here is that 
the computation of the posterior marginals for v does not affect the graph of the latent field, 
leading to a much faster approximation. That is why this is the default method in R-INLA, but 
more accurate approximations can be obtained by switching to the first approach, if necessary. 

Example 4. Following is R code to compute the posterior marginal of a linear combination 
between elements of the AR{1) process of Example [2j More specifically, we are interested in 

Vl = 321,2 — 5Z1,4 
V2 = Zi^3 + 2zi,5, 

where Zij denote the jth element of the latent model Zj as defined in Example [2[ 

# define the linear combinations: 

# v_l = 3*z_{l,2} - 5*z_{l,4} 

# v_2 = z_-[l,3>+ 2*z_{l,5} 

Icl = inla. make. lincombCi = c(NA, 3, NA, -5)) 
names(lcl) = "Icl" 

lc2 = inla. make. lincombCi = c(NA, NA, 1, NA, 2)) 
names (lc2) = "lc2" 

# compute v_l and v_2 using the default method, 
result = inlaCf ormula, 

family = c("poisson", "gaussian") , 
data = listCy =y, i=i, r=r), 

control . family = list(list(), listChyper = hyper.gaussicin)) , 
lincomb = c(lcl, lc2)) 

# compute v_l and v_2 with the more accurate (and slow) approach. 
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result2 = inla(f ormula, 

family = cC'poisson" , "gaussian"), 
data = listCy =y, i=i, r=r), 

control . family = list(list(), list(hyper = hyper.gaussictn)) , 
lincomb = c(lcl, lc2) , 

control . inla = list (lincomb. derived. only = FALSE)) 



, , -4 -2 2 4 

(a) (b) 
Figure 4: Posterior distribution for the linear combinations computed with both methods de- 



scribed in Section 3.4, the sohd Hne represent the more accurate approach while the dashed line 
represent the faster one. (a) Posterior distribution for vi (b) Posterior distribution for V2- 

The code illustrates how to use both approaches described in this section and Figure [4] shows 
the posterior distributions for vi and V2 computed with the more accurate approach (solid line) 
and with the faster one (dashed line). We can see little difference between the two approaches 
in this example. We refer to the FAQ section on the R-INLA website for more information about 
defining multiple linear combinations at once. 



□ 



When using the faster approach, there is also an option to compute the posterior correlation 
matrix between all the linear combinations by using 

control. inla = list (lincomb. derived. correlation. matrix = TRUE) 

This correlation matrix could be used for example to build a Gaussian copula to approximate 



the joint density of some components of the latent field, as discussed in Section 6.1 of |Rue et al. 

(poool). 
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3.5 More flexible dependence on the latent fleld 



As mentioned in Section 2.1 , the INLA method in its original implementation allowed each data 
point to be connected to only one element in the latent field. While this is often the case, this 
assumption is violated, for example, when the observed data consists of area or time averages 
of the latent field. In this case, 

ili\x,6i 7r{yi\^aijXj,0i). (10) 



Assume A to be the matrix formed by the {ajj} elements in Eq. (10). We further assume that 
the dependence of the data on the latent field is "local" in the sense that most elements of A 
are zero. With this assumption everything stays Markovian and fast inference is still possible. 
This is defined in R-INLA by modifying the control . compute argument of the inla function as 
follows: 

inla( . . . , control . compute = list (A = A)) 
Internally, R-INLA add another layer in the hierarchical model 

T]* = Arj 

where r/* is formed by a linear combination of the linear predictor 77, but now the likelihood 
function is connected to the latent field through t}* instead of r}, 



y\x,ei ~ J|vr(yi|r?*,0i). 



i=l 

This is a very powerful feature that allow us to fit models with a likelihood representation given 



by Eq. (10) and besides it can even mimic to some extent the copy feature of Section 3.3, with 
the exception that the copy feature allow us to copy model components using an unknown scale 
parameter as illustrated in Eq. (|9]). This feature is implemented by also adding rj* to the latent 
model, where the conditional distribution for t]* has mean Ar] and precision matrix kaI where 
the constant ka is set to a high value, like ka = exp(15). In terms of output from inla, then 
{ri*,ri) is the linear predictor. 

To illustrate the relation between the A matrix and the copy feature we fit the model of 
Example [3] again but now using the feature described in this Section. Following is the R code: 

## This is an alternative implementation of the model in Exemiple 3 
i = 1: (2*n) 
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zz = c(rep(l, n) , z) 

formula = y ~ f(i, zz, model = "iid2d", n = 2*n) - 1 # Define eta 
I = Diagonal (n) 

A = cBinddjI) # Define A matrix used to construct eta* = A eta 
result = inla(f ormula, 

data = listCy = y, zz = zz, i = i) , 

family = "gaussian" , 

control. predictor = list (A = A)) 
suimnary (result) 
plot (result) 

In some cases it has shown useful to simply define the model using the A-matrix, by simply 
defining 77 as a long vector of all the different components that the full model consist at, and then 
putting it all together using the A matrix. The following simplistic linear regression example 
demonstrate the idea. Note that 771 is the intercept and r)2 is the effect of covariate x. 

n = 100 

X = rnorm(n) 

y = 1 + X + rnorm(n, sd = 0.1) 
intercept = c(l, NA) 
b = c(NA, 1) 
A = cbind(l,x) 

r = inla(y ~ -1 + intercept + b, family = "gaussian", 

data = list(A = A, y = y, intercept = intercept, b = b) , 
control. predictor = list (A = A)) 

3.6 Kronecker feature 

In a number of applications, the precision matrix in the latent model can be written as a 
Kronecker product of two precision matrices. A simple example of this is the separable space- 
time model constructed by using spatially correlated innovations in an AR{1) model: 

where is a scalar and e ~ A^(0, Q^^)- In this case the precision matrix is Q = Qar{i) ® Qe^ 
where is the Kronecker product. 

The general Kronecker product mechanism is currently in progress, but a number of special 
cases are already available in the code through the group feature. For example, a separable 
spatio-temporal model can be constructed using the command 

result = y ~ f(loc, model = "besag", 

group = time, control . group = list (model = "arl")) 
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in which every observation is assigned a location loc and a time time. At each time the points 
are spatiahy correlated while across the time periods, they evolve according to an AR{1) process; 



see for example Cameletti et al. (2012). Besides the AR{1) model, an uniform correlation matrix 
defining exchangeable latent models, a random- walk of order one (RWl) and of order two (RW2) 
are also implemented in R-INLA through the group feature. 



4 On the posterior marginals for the hyperparameters 

This Section starts by describing the grid exploration required to integrate out the uncertainty 
with respect to when computing the posterior marginals of the latent field. Besides it presents 
two algorithms that can be used to compute the posterior marginals of the hyperparameters 
with little additional cost by using the points of the joint density of the hyperparameters already 
evaluated during the grid exploration. 



4.1 Grid exploration 



The main focus in Rue et al. (2009 ) lies on posterior marginals for the latent field. In this context. 



7r{0\y) is used to integrate out uncertainty with respect to when approximating 7r{xi\y). For 
this task we do not need a detailed exploration of TT{9\y) as long as we are able to select good 
evaluation points for the numerical solution of Eq. Rue et al. (2009) propose two different 
exploration schemes to perform the integration. 

Both schemes require a reparametrization of 0-space in order to make the density more 
regular, we denote such parametrization as the z-parametrization throughout the paper. Assume 
= {9i, . . . , Om) S 7^™", which can always be obtained by ad-hoc transformations of each element 
of 9, we proceed as follows: 

1. Find the mode 6* of 7r{0\y) and compute the negative Hessian H at the modal configu- 
ration 



2. Compute the eigen-decomposition S = V A^^'^V'^ where S 

3. Define a new z-variable such that 

eiz) = e* + vh^i^z 



H 



The variable z 



[zi, ■ 



is standardized and its components are mutually orthogonal. 



At this point, if the dimension of 9 is small, say m < 5, Rue et al. (2009) propose to use the 



z-parametrization to build a grid covering the area where the density of Tr{0\y) is higher. Such 
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procedure has a computational cost which grows exponentiahy with m. It turns out that when 
the goal is iT{xi\y) a rather rough grid is enough to give accurate results. 



If the dimension of 6 is higher, Rue et al. ( 2009 ) propose a different approach, named CCD 



integration. Here the integration problem is considered as a design problem and, using the mode 
0* and the negative Hessian H as a guide, we locate some "points" in the m-dimensional space 
which allows us to approximate the unknown function with a second order surface (see Section 



6.5 of Rue et al. , 2009). The CCD strategy requires much less computational power compared 



to the grid strategy but, when the goal is Tr{xi\y), it still allows to capture variability in the 
hyperparameter space when this is too wide to be explored via the grid strategy. 

Figure [5] shows the location of the integration points in a two dimensional 0-space using the 
grid and the CCD strategy. 





(a) 



(b) 



Figure 5: Location of the integration points in a two dimensional 0-space using the (a) grid and 
(b) the CCD strategy 



4.2 Algorithms for computing 7i{6j\y) 

If the dimension of 6 is not too high, it is possible to evaluate Tr{9\y) on a regular grid and use the 
resulting values to numerical compute the integral in Eq. Q by summing out the variables 0-j. 
Of course this is a naive solution in which the cost to obtain m such marginals would increase 
exponentially on m. A more elaborate solution would be to use a Laplace approximation 



T^G{d-j\ej,y) 



(11) 
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where is the modal configuration of 7f(0_j|0j, y) and TTG{0-j\9j,y) is a Gaussian approxi- 
mation to TT{0^j\9j, y) built by matching the mode and the curvature at the mode. This would 
certainly give us accurate results but it requires to find the maximum of the (m— 1) dimensional 
function TT(6-j\6j,y) for each value of 9j, which again does not scale well with the dimension 
m of the problem. Besides that, the Hessian computed at the (numerically computed) mode of 
Tr{0^j\9j,y) was not always positive definite, which became a major issue. It is worth pointing 
out that in latent Gaussian models of interest, the dimension of the latent field is usually quite 
big, which makes the evaluation of Tr{6\y) given by Eq. ^ expensive. With that in mind, it 
is useful to build and use algorithms that uses the density points already evaluated in the grid 



exploration of Tr{6\y) as described in Section 4.1 Remember that those grid points already 



had to be computed in order to integrate out the uncertainty about 6 using Eq. so that 
algorithms that uses those points to compute the posterior marginals for would be doing so 
with little extra cost. 

4.2.1 Asymmetric Gaussian interpolation 

Some information about the marginals TT{9j\y) can be obtained by approximating the joint 
distribution Tr{6\y) with a multivariate Normal distribution by matching the mode and the 
curvature at the mode of 7r{0\y). Such Gaussian approximation for 7r{9j\y) comes with no 
extra computational effort since the mode 6* and the negative Hessian H of TT{9\y) are already 
computed in the numerical strategy used to approximate Eq. ^ as described in Section 4.1 



Unfortunately, 7r{9j\y) can be rather skewed so that a Gaussian approximation is inaccurate. 
It is possible to correct the Gaussian approximation for the lack of asymmetry, with minimal 
additional costs, as described in the following. 

Let z{6) = {zi{9), ...,Zm{0)) be the point in the 2;-parametrization corresponding to 0. We 
define the function f{0) as 

m 



where 



^) if z>0 



^'^^^ I exp(- if z<0. ^^^^ 

In order to capture some of the asymmetry of 7r(0|y) we allow the scaling parameters {(t^~^, cr^^), 
j = 1, . . . , m, to vary not only according the m different axis but also according to the direction, 
positive and negative, of each axis. To compute these, we first note that in a Gaussian density, 
the drop in log density when we move from the mode to zt 2 the standard deviation is —2. We 
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compute our scaling parameters in such a way that this is approximately true for all directions. 
We do this while exploring 7r(0|y) to solve Eq. ([2]), meaning that no extra cost is required. An 
illustration of this process is given in Figure |6| 




2a^ 2a^ 

Figure 6: Schematic picture of the process to compute the scaling parameters that determine 



the form of the asymmetric Gaussian function given by Eq. (13). The solid line is the log-density 



of the distribution we want to approximate, and the scaling parameters and cr^ are obtained 
accordingly to a —2 drop in the target log-density. 



Approximations for Tr{6j\y) are then computed via numerical integration of Eq. (12), which 
is easy to do once the scaling parameters are known. Figure [7] illustrates the flexibility of fj{z) 



in Eq. (13) for different values of a and a~^. 




Figure 7: Standard normal distribution (solid line) and densities given by Eq. (13) for different 
values of the scaling parameters (dashed lines). 

This algorithm was successfully used in the R-INLA package for a long time, and our ex- 
perience is that it gives accurate results with low computational time. However, we came to 
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realize that the multi-dimensional numerical integration algorithms available to integrate out 



O^j in Eq. (12) gets increasingly unstable as we start to fit models with higher number of hy- 
perparameters, resulting in approximated posterior marginals densities with undesirable spikes 
instead of smooth ones. This has lead us to look for an algorithm that gives us accurate and fast 
approximations without the need to use those multi-dimensional integration algorithms, and we 
now describe our proposed solution. 

4.2.2 Numerical integration free algorithm 

The approximated posterior marginals 'k{Oj\y) returned by the new numerical integration free 
algorithm will assume the following structure, 



iV(0,a|+), 0, >0 
iV(0,a|_), 0, <0 



and the question now becomes how to compute (t|^_, J = 1, ...,m without using numerical 



integration as in Section 4.2.1 The following lemma will be useful for that (Rue et al. 2009), 



Lemma 1. Let x = (xi, ~ A^(0,5]); then for all xi 

^ \ E{x-i\xi) J ^ ^11 

The lemma above can be used in our favor since it states that the joint distribution of as a 
function of 9i with 0_j evaluated at the conditional mean E{0-i\xi) behaves as the marginal of 
9i. In our case this will be an approximation since 6 is not Gaussian. 

For each axis j = l,...,m our algorithm will compute the conditional mean E{0^j\9j) as- 
suming 6 to be Gaussian, which is linear in 6j and depend only on the mode 9* and covariance 
5] already computed in the grid exploration of Section 4.1 and then use Lemma [T] to explore 
the approximated posterior marginal of 6j in each direction of the axis. For each direction of 
the axis we only need to evaluate three points of this approximated marginal given by Lemma 
[l| which is enough to compute the second derivative and with that get the standard deviations 



a J and required to represent Eq. (|14|). 



Example 5. To illustrate the difference in accuracy between the numerical integration free 
algorithm and the posterior marginals obtained via a more computationally intensive grid ex- 
ploration we show in Figure [8] the posterior marginals of the hyperparameters of Example [2] 
computed by the first (solid line) and by the latter (dashed line). We can see that we lose 
accuracy when using the numerical integration free algorithm but it still gives us sensible results 
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with almost no extra computation time while we need to perform a second finer grid exploration 
to obtain a more accurate result via the grid method, a operation that can take a long time 
in examples with high dimension of the latent field and/or hyperparameters. The numerical 
integration free algorithm is the default method to compute the posterior marginals for the 
hyperparameters. In order to get more accurate results via the grid method the user needs to 
use the output of the inla function into the inla.hyperpar function. For example, to generate 
the marginals computed by the grid method in Figure [8] we have used 

result .hyperpar = inla.hyperpar (result) 




(a) ■ ■ ■ (b) ■ ■ (c) 

Figure 8: Posterior distribution for the hyperparameters in the replicate example with a vertical 
line to indicate the true values used in the simulation. Solid line computed by the numerical 
integration free algorithm and dashed line computed via the more expensive grid exploration, 
(a) Gaussian observational precision (b) Precision of the AR(1) latent model (c) persistence 
parameter for the AR(1) process 

5 Conclusion 

The INLA framework has become a daily tool for many applied researchers from different areas 
of application. With this increase in usage came as well an increase in demand for the possi- 
bility to fit more complex models from within R. It has happened in a way that many of the 
latest developments have come from necessity expressed by the users. In this paper we have 
described and illustrated several new features implemented in the R package R-INLA that have 
greatly extended the scope of models available to be used within R. This is an active project that 
continues to evolve in order to fulfill, as better as possible, the demands of the statistical and 
applied community. Several case studies that have used the features formalized in this paper 
can be found in the INLA website. Those case studies treat a variaty of situations, as for ex- 
ample dynamic models, shared random-effects models, spatio-temporal models and preferential 
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sampling, and serve to illustrate the generic nature of the features presented in Section [3} 

Most of the attention in Rue et al. (2009) have been focused on the algorithms to compute 
the posterior marginals of the latent field since this is usually the most challenging task given 
the usual big size of the latent field. However, the computation of posterior marginals of the 
hyperparameters is not straightforward given the high cost to evaluate the approximation to the 
joint density of hyperparameters. We have here described two algorithms that have been used 
successfully to obtain posterior marginals of the hyperparameters by using the few evaluation 
points already computed when integrating out the uncertainty with respect to the hyperparam- 
eters in the computation of the posterior marginals of the elements in the latent field. 
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